\documentclass{report}

%%METADATA
\title{README for Replication of \\ \emph{The Earnings and Labor Supply of U.S.~Physicians}}
\author{
  \begin{tabular}{ccc}
    Joshua D.~Gottlieb & & Maria Polyakova \\
    Kevin Rinz & & Hugh Shiplett\\
    \multicolumn{3}{c}{Victoria Udalova}\\
  \end{tabular}
}
\date{December 2024}

%%PACKAGES
\usepackage{graphicx}
\usepackage{grffile}
\usepackage{tabularx}
\usepackage{setspace}
\usepackage{amsmath,amsthm,amssymb}
\usepackage[hyphens]{url}
\usepackage{natbib}
\usepackage[font=normalsize,labelfont=bf]{caption}
\usepackage[margin=1in]{geometry}
\usepackage{hyperref}
\hypersetup{colorlinks=true,urlcolor=blue,citecolor=red}
\usepackage{enumerate}% http://ctan.org/pkg/enumerate %Supports lowercase Roman-letter enumeration
\usepackage{verbatim} %Package with \begin{comment} environment
\usepackage{physics}
\usepackage{tikz}
\usepackage{tikz-3dplot}
\usepackage{tkz-euclide}
\usepackage{pgfplots}
\usepackage{pdflscape}
\pgfplotsset{compat = newest}
\usetikzlibrary{automata,positioning}
\usepgfplotslibrary{external}
\tikzexternalize

\usepackage{listings}
\usepackage{upquote}
\usepackage{booktabs} %Package with \toprule and \bottomrule
\usepackage{etoc}     %Package with \localtableofcontents
\usepackage{multicol}
\usepackage{bm}
\usepackage{bbm}
\usepackage{placeins} %Package with \FloatBarrier
\setlength{\parskip}{0.5em}
\usepackage{subcaption}
\captionsetup{compatibility=false}
\usepackage[T1]{fontenc}
\usepackage{xr}

\usetikzlibrary{math}

\definecolor{dkgreen}{rgb}{0,0.6,0}
\definecolor{gray}{rgb}{0.5,0.5,0.5}
\definecolor{mauve}{rgb}{0.58,0,0.82}

\lstset{language=bash,
  frame=tb,
  aboveskip=3mm,
  belowskip=3mm,
  showstringspaces=false,
  columns=flexible,
  basicstyle={\small\ttfamily},
  numbers=none,
  numberstyle=\tiny\color{gray},
  keywordstyle=\color{blue},
  commentstyle=\color{dkgreen},
  stringstyle=\color{mauve},
  breaklines=true,
  breakatwhitespace=false,
  tabsize=3
}

%%FORMATTING
\onehalfspacing
\numberwithin{equation}{section}
\numberwithin{figure}{section}
\numberwithin{table}{section}

\begin{document}

\maketitle

\section*{Overview}
The code for this replication package produces all exhibits in the published paper.
The entire replication package can be run with a single command by typing \texttt{run\_project\_from\_start\_to\_finish} 
into the terminal from within the \texttt{/replication/code} folder. 
This command submits a PBS batch job that runs the do file \texttt{docinc-run\_project.do}, 
which in turn executes all the other Stata and Python scripts for the project. 
As of December 2024, this batch job takes approximately five hours to run with the current server setup.

\section*{Data}
Replicating our work requires access to restricted versions of 
the Centers for Medicare and Medicaid Services' (CMS) 
National Plan and Provider Enumeration System (NPPES) data, 
federal income tax data, American Community Survey (ACS) data, 
and administrative records from the Social Security Administration (SSA). 
Access to these datasets is governed by strict data use agreements that specify allowed uses of these data.
These data are available only to researchers with Special Sworn Status (SSS) 
working on approved internal Census Bureau projects authorized under Titles 13 and 26. 
All non-Census-held data used in this project are publicly available.
For additional data details, see Appendix B.1 of the paper.

\section*{Project structure}
All code is contained within the \texttt{/replication/code} folder. 
Its subfolders are listed below in the order in which their code is run.
\begin{itemize}
    \item \texttt{/cleaning}
    \item \texttt{/descriptives}
    \item \texttt{/twfe}
    \item \texttt{/govt\_policy}
    \item \texttt{/spec\_choice}
    \item \texttt{/pdv}
    \item \texttt{/tuition}
    \item \texttt{/exhibits}
\end{itemize}
Code in each subfolder generates results corresponding to that subfolder's name. 
For instance, the code in \texttt{/twfe} produces our two-way fixed-effects estimates. 
Each folder has a corresponding \texttt{do} file, named after it, in the \texttt{/exhibits} folder. 
For example, exhibits containing results from \texttt{/twfe} are produced in \texttt{docinc-exhibits\_twfe.do}.
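
The naming convention above can be written down as a small helper. 
The following Python sketch is illustrative and is not part of the replication package. 
It assumes that the pattern shown for \texttt{/twfe} holds for every analysis folder, which is inferred from the single example given here. 
The names \texttt{ANALYSIS\_FOLDERS} and \texttt{exhibit\_do\_file} are hypothetical.

\begin{lstlisting}[language=Python]
# Folders that this README describes as producing rounded outputs,
# in the order in which they are run.
ANALYSIS_FOLDERS = [
    "descriptives",
    "twfe",
    "govt_policy",
    "spec_choice",
    "pdv",
    "tuition",
]


def exhibit_do_file(folder):
    """Return the assumed name of the exhibits do file for a folder.

    Assumption: every folder follows the docinc-exhibits_<folder>.do
    pattern that the README documents for /twfe.
    """
    name = folder.strip("/")
    return f"docinc-exhibits_{name}.do"


if __name__ == "__main__":
    for folder in ANALYSIS_FOLDERS:
        print(folder, "->", exhibit_do_file(folder))
\end{lstlisting}

A replicator could compare the printed names against the actual contents of \texttt{/exhibits} to confirm whether the assumed pattern holds.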

The folders \texttt{/descriptives} through \texttt{/tuition} above produce outputs that are rounded according to Census disclosure rules 
and saved in \texttt{/replication/intermediate\_csv}. 
Intermediate outputs such as unrounded estimates and log files are saved to \texttt{/replication/output} or \texttt{/replication/temp}. 
The \texttt{/exhibits} folder produces exhibits using the rounded estimates in \texttt{/intermediate\_csv}, 
saving the figures in \texttt{/figures} and the tables in \texttt{/tables}. 

\section*{Adjusting the code}
Our code assumes that two data files already exist: \texttt{panel\_lawyers.dta} and \texttt{panel\_physicians.dta}, 
which contain income and demographic data for our panels of lawyers and physicians.
These files were created by Census employees, who processed the raw tax data using SAS.
Variable names in our code have been redacted and replaced with intuitive placeholder names (e.g., \texttt{w2wgs} and \texttt{social\_security}). 
Replicators with access to the original raw tax data will need to map these placeholder names back to their original counterparts.
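
The mapping step could be scripted rather than done by hand. 
The following Python sketch is a hypothetical helper and is not part of the replication package. 
It replaces whole-word occurrences of each placeholder in the text of a script. 
The right-hand names in \texttt{PLACEHOLDER\_TO\_ORIGINAL} are stand-ins, because the original names are redacted; the function name \texttt{restore\_names} is also an assumption.

\begin{lstlisting}[language=Python]
import re

# Hypothetical mapping. The keys are placeholders named in this README;
# the values are stand-ins for the redacted original variable names.
PLACEHOLDER_TO_ORIGINAL = {
    "w2wgs": "ORIGINAL_NAME_1",
    "social_security": "ORIGINAL_NAME_2",
}


def restore_names(script_text, mapping):
    """Replace whole-word placeholder names in script_text.

    A name counts as a whole word when it is not adjacent to a letter,
    digit, or underscore, so w2wgs_adj is left untouched.
    """
    if not mapping:
        return script_text
    alternatives = "|".join(re.escape(name) for name in mapping)
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])(" + alternatives + r")(?![A-Za-z0-9_])"
    )
    # A single pass, so text that was just inserted is never replaced again.
    return pattern.sub(lambda m: mapping[m.group(1)], script_text)


if __name__ == "__main__":
    line = "gen total = w2wgs + social_security"
    print(restore_names(line, PLACEHOLDER_TO_ORIGINAL))
\end{lstlisting}

The sketch treats scripts as plain text, so it would also rewrite a placeholder that appears inside a string or a file name. 
Any rewritten script should be reviewed before it is run.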

Replicators may need to adjust file paths based on the location of their data. 
These adjustments should be made in \texttt{docinc-run\_project.do}, which defines file paths using global macros relative to the \texttt{/replication/code} directory.

\section*{Computational requirements}
Our code was run in December 2024 using the following software:
\begin{itemize}
    \item Stata 18
    \item Python 3.9.12
    \item Bash shell
    \item LaTeX
\end{itemize}
All dependencies above are available by default on the FSRDC server.  
For all Stata and Python packages, we used the server's default versions.
Custom \texttt{ado} functions to round estimates according to Census disclosure rules 
are stored in the \texttt{/replication/code/ado} folder. 
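
As an illustration of what rounding an estimate can involve, the sketch below rounds a number to a fixed count of significant digits. 
It is a Python sketch under an assumed rule (four significant digits) and does not describe the \texttt{ado} files, whose rules are not spelled out in this README. 
The function name \texttt{round\_sig} is hypothetical. 
Replicators should rely on the \texttt{ado} code and on current Census guidance.

\begin{lstlisting}[language=Python]
import math


def round_sig(value, digits=4):
    """Round value to the given number of significant digits.

    Assumption: four significant digits is used here for illustration
    only; it is not taken from the replication package.
    """
    if value == 0 or not math.isfinite(value):
        return value
    # Position of the leading digit, e.g. 5 for 123456.789.
    exponent = math.floor(math.log10(abs(value)))
    # A negative second argument rounds to the left of the decimal point.
    # Python's round() resolves exact ties to the even digit.
    return round(value, digits - 1 - exponent)


if __name__ == "__main__":
    print(round_sig(123456.789))
    print(round_sig(0.00123456))
\end{lstlisting}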

\end{document}





